The present invention relates to information processing technology and, more particularly, to a method and apparatus for preprocessing a plurality of documents for search, a method and apparatus for presenting a search result, and a system for searching documents that comprises these apparatuses.
Nowadays, search engines typically generate a snippet of a document returned by a search by extracting, from the document, the partial content nearest to the query keywords input by the user, and present that snippet to the user as a search result. The snippet gives the user an immediate view of the main topic of the returned document in relation to the query keywords, so that the user can determine whether the document is relevant to the query according to his or her own requirements. An existing search engine method for generating snippets is called the Nearest Words based Snippet Generating Method.
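By way of illustration only, the extraction of the content nearest to a query keyword can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation used by any particular search engine: the function name nearest_words_snippet, the window parameter, and the choice to stop at the first keyword hit are all hypothetical.

```python
import re


def nearest_words_snippet(text, keywords, window=5):
    """Return the words surrounding the first keyword hit (hypothetical helper)."""
    words = text.split()
    wanted = {k.lower() for k in keywords}
    for i, word in enumerate(words):
        # Strip punctuation so that "search," still matches the keyword "search".
        if re.sub(r"\W+", "", word).lower() in wanted:
            start = max(0, i - window)
            end = min(len(words), i + window + 1)
            return " ".join(words[start:end])
    return ""  # no query keyword occurs in the document
```

Such a snippet reflects only the local context of the keyword and carries no information about the section or chapter in which the keyword occurs.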
However, the Nearest Words based Snippet Generating Method cannot give the user an overall picture of a document's structure, and the generated snippet loses the granularity information (the hierarchy of the document). Especially for long documents such as learning materials and project whitepapers, which are very common in enterprise environments, the Nearest Words based Snippet Generating Method generally does not give the querying user sufficient summary information about the documents returned by the search, and therefore cannot help the user understand the main content of the documents quickly.
Therefore, there is a need for a new method for generating snippets and presenting search results that provides the querying user with overviews of the documents returned by a search, so that the user can quickly understand the overall picture of a document and determine its relevance to the query, thereby improving the speed of browsing the search results.